{
 "cells": [
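  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to HTML Parsing and Preparing the URL Generation Combo\n",
    "*  This week: systematically parse URL parameter information and build a parameter template\n",
    "*  Last week: HTML parsing (parse HTML)\n",
    "*  _week07_\n",
    "*  Lecture notes designed by: 许智超\n",
    "\n",
    "\n",
    "-----\n",
    "## This week's content and learning objectives\n",
    "\n",
    "### 1. URL parsing\n",
    "\n",
    "* Liepin desktop site (liepin.com): extracting the essential parameters from job-search URLs\n",
    "* How to generate a series of new URLs for further scraping\n",
    "\n",
    "### 2. Pagination (student exercise)\n",
    "\n",
    "<mark> How to systematically scrape more pages of data that share the same structure </mark>\n",
    "\n",
    "To do this, we need to learn:\n",
    "\n",
    "* Pagination: breaking down the parameter dictionary\n",
    "  * xpath\n",
    "  * building a parameter template\n",
    "  * building a parameter dictionary\n",
    "* Pagination: systematic iteration\n",
    "  * robots.txt\n",
    "  * request frequency and timing\n",
    "* Pagination: data backup and consolidation\n",
    "  * saving backups\n",
    "  * consolidating data\n",
    "  \n",
    "## Goals\n",
    "1. Use requests-html to fetch web pages and save them as text files; consult the [requests-html Chinese documentation](https://cncert.github.io/requests-html-doc-cn/#/)\n",
    "2. Get familiar with [XPath syntax](https://www.w3cschool.cn/xpath/xpath-syntax.html) and [XPath nodes](https://www.w3cschool.cn/xpath/xpath-nodes.html)\n",
    "3. Use the [xpath cheatsheet](https://devhints.io/xpath)\n",
    "  * in Chrome Inspector\n",
    "  * in requests-html (Python)\n",
    "4. Basic use of [pd.DataFrame](https://www.pypandas.cn/doc/getting_started/dsintro.html#dataframe)\n",
    "5. Breaking down and iterating over the parameter dictionary\n",
    "6. Backing up and consolidating paginated data"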
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Core modules\n",
    "import pandas as pd\n",
    "from requests_html import HTMLSession"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Building a category parameter scheme for systematically scraping liepin"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Company categories\n",
    "\n",
    "* Can the parameter changes be seen directly by inspecting the links?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "url = \"https://www.liepin.com/zhaopin/\"\n",
    "session = HTMLSession()\n",
    "r = session.get( url )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'//div[@class=\"wrap\"]//div[contains(@class,\"hot-comp-tags\")]/a'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "分类_xpath = '//div[@class=\"wrap\"]//div[contains(@class,\"hot-comp-tags\")]/a'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['https://www.liepin.com/zhaopin/?compkind=&dqs=&pubTime=&pageSize=40&salary=&compTag=180&sortFlag=&compIds=&subIndustry=&jobKind=&industries=&compscale=&key=&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=0c2410a36507b09206766cd10bec35ea&d_curPage=0&d_pageSize=40&d_headId=0c2410a36507b09206766cd10bec35ea',\n",
       " 'https://www.liepin.com/zhaopin/?compkind=&dqs=&pubTime=&pageSize=40&salary=&compTag=188&sortFlag=&compIds=&subIndustry=&jobKind=&industries=&compscale=&key=&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=0c2410a36507b09206766cd10bec35ea&d_curPage=0&d_pageSize=40&d_headId=0c2410a36507b09206766cd10bec35ea',\n",
       " 'https://www.liepin.com/zhaopin/?compkind=&dqs=&pubTime=&pageSize=40&salary=&compTag=185&sortFlag=&compIds=&subIndustry=&jobKind=&industries=&compscale=&key=&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=0c2410a36507b09206766cd10bec35ea&d_curPage=0&d_pageSize=40&d_headId=0c2410a36507b09206766cd10bec35ea',\n",
       " 'https://www.liepin.com/zhaopin/?compkind=&dqs=&pubTime=&pageSize=40&salary=&compTag=183&sortFlag=&compIds=&subIndustry=&jobKind=&industries=&compscale=&key=&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=0c2410a36507b09206766cd10bec35ea&d_curPage=0&d_pageSize=40&d_headId=0c2410a36507b09206766cd10bec35ea',\n",
       " 'https://www.liepin.com/zhaopin/?compkind=&dqs=&pubTime=&pageSize=40&salary=&compTag=176&sortFlag=&compIds=&subIndustry=&jobKind=&industries=&compscale=&key=&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=0c2410a36507b09206766cd10bec35ea&d_curPage=0&d_pageSize=40&d_headId=0c2410a36507b09206766cd10bec35ea',\n",
       " 'https://www.liepin.com/zhaopin/?compkind=&dqs=&pubTime=&pageSize=40&salary=&compTag=130&sortFlag=&compIds=&subIndustry=&jobKind=&industries=&compscale=&key=&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=0c2410a36507b09206766cd10bec35ea&d_curPage=0&d_pageSize=40&d_headId=0c2410a36507b09206766cd10bec35ea']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "公司分类 = [list(i.absolute_links)[0] for i in r.html.xpath(分类_xpath)]\n",
    "公司分类"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{0: '中国500强', 1: '互联网300强', 2: '创新企业100', 3: '强制造业500强', 4: '专精特新企业', 5: '独角兽'}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "公司分类_str = [\"中国500强\",\"互联网300强\",\"创新企业100\",\"强制造业500强\",\"专精特新企业\",\"独角兽\"]\n",
    "{k:i for k,i in enumerate(公司分类_str)}  # enumerate() pairs each item of an iterable (list, tuple, string) with its index; it is typically used in for loops"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'中国500强': ['compkind=',\n",
       "  'dqs=',\n",
       "  'pubTime=',\n",
       "  'pageSize=40',\n",
       "  'salary=',\n",
       "  'compTag=180',\n",
       "  'sortFlag=',\n",
       "  'compIds=',\n",
       "  'subIndustry=',\n",
       "  'jobKind=',\n",
       "  'industries=',\n",
       "  'compscale=',\n",
       "  'key=',\n",
       "  'siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw',\n",
       "  'd_sfrom=search_unknown',\n",
       "  'd_ckId=0c2410a36507b09206766cd10bec35ea',\n",
       "  'd_curPage=0',\n",
       "  'd_pageSize=40',\n",
       "  'd_headId=0c2410a36507b09206766cd10bec35ea'],\n",
       " '互联网300强': ['compkind=',\n",
       "  'dqs=',\n",
       "  'pubTime=',\n",
       "  'pageSize=40',\n",
       "  'salary=',\n",
       "  'compTag=188',\n",
       "  'sortFlag=',\n",
       "  'compIds=',\n",
       "  'subIndustry=',\n",
       "  'jobKind=',\n",
       "  'industries=',\n",
       "  'compscale=',\n",
       "  'key=',\n",
       "  'siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw',\n",
       "  'd_sfrom=search_unknown',\n",
       "  'd_ckId=0c2410a36507b09206766cd10bec35ea',\n",
       "  'd_curPage=0',\n",
       "  'd_pageSize=40',\n",
       "  'd_headId=0c2410a36507b09206766cd10bec35ea'],\n",
       " '创新企业100': ['compkind=',\n",
       "  'dqs=',\n",
       "  'pubTime=',\n",
       "  'pageSize=40',\n",
       "  'salary=',\n",
       "  'compTag=185',\n",
       "  'sortFlag=',\n",
       "  'compIds=',\n",
       "  'subIndustry=',\n",
       "  'jobKind=',\n",
       "  'industries=',\n",
       "  'compscale=',\n",
       "  'key=',\n",
       "  'siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw',\n",
       "  'd_sfrom=search_unknown',\n",
       "  'd_ckId=0c2410a36507b09206766cd10bec35ea',\n",
       "  'd_curPage=0',\n",
       "  'd_pageSize=40',\n",
       "  'd_headId=0c2410a36507b09206766cd10bec35ea'],\n",
       " '强制造业500强': ['compkind=',\n",
       "  'dqs=',\n",
       "  'pubTime=',\n",
       "  'pageSize=40',\n",
       "  'salary=',\n",
       "  'compTag=183',\n",
       "  'sortFlag=',\n",
       "  'compIds=',\n",
       "  'subIndustry=',\n",
       "  'jobKind=',\n",
       "  'industries=',\n",
       "  'compscale=',\n",
       "  'key=',\n",
       "  'siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw',\n",
       "  'd_sfrom=search_unknown',\n",
       "  'd_ckId=0c2410a36507b09206766cd10bec35ea',\n",
       "  'd_curPage=0',\n",
       "  'd_pageSize=40',\n",
       "  'd_headId=0c2410a36507b09206766cd10bec35ea'],\n",
       " '专精特新企业': ['compkind=',\n",
       "  'dqs=',\n",
       "  'pubTime=',\n",
       "  'pageSize=40',\n",
       "  'salary=',\n",
       "  'compTag=176',\n",
       "  'sortFlag=',\n",
       "  'compIds=',\n",
       "  'subIndustry=',\n",
       "  'jobKind=',\n",
       "  'industries=',\n",
       "  'compscale=',\n",
       "  'key=',\n",
       "  'siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw',\n",
       "  'd_sfrom=search_unknown',\n",
       "  'd_ckId=0c2410a36507b09206766cd10bec35ea',\n",
       "  'd_curPage=0',\n",
       "  'd_pageSize=40',\n",
       "  'd_headId=0c2410a36507b09206766cd10bec35ea'],\n",
       " '独角兽': ['compkind=',\n",
       "  'dqs=',\n",
       "  'pubTime=',\n",
       "  'pageSize=40',\n",
       "  'salary=',\n",
       "  'compTag=130',\n",
       "  'sortFlag=',\n",
       "  'compIds=',\n",
       "  'subIndustry=',\n",
       "  'jobKind=',\n",
       "  'industries=',\n",
       "  'compscale=',\n",
       "  'key=',\n",
       "  'siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw',\n",
       "  'd_sfrom=search_unknown',\n",
       "  'd_ckId=0c2410a36507b09206766cd10bec35ea',\n",
       "  'd_curPage=0',\n",
       "  'd_pageSize=40',\n",
       "  'd_headId=0c2410a36507b09206766cd10bec35ea']}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "公司分类_dict = {i:公司分类[k].split('?')[1].split('&') for k,i in enumerate(公司分类_str)}  # split each link above into its list of query parameters\n",
    "公司分类_dict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "公司分类_df = pd.DataFrame(公司分类_dict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>中国500强</th>\n",
       "      <th>互联网300强</th>\n",
       "      <th>创新企业100</th>\n",
       "      <th>强制造业500强</th>\n",
       "      <th>专精特新企业</th>\n",
       "      <th>独角兽</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>compkind=</td>\n",
       "      <td>compkind=</td>\n",
       "      <td>compkind=</td>\n",
       "      <td>compkind=</td>\n",
       "      <td>compkind=</td>\n",
       "      <td>compkind=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>dqs=</td>\n",
       "      <td>dqs=</td>\n",
       "      <td>dqs=</td>\n",
       "      <td>dqs=</td>\n",
       "      <td>dqs=</td>\n",
       "      <td>dqs=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>pubTime=</td>\n",
       "      <td>pubTime=</td>\n",
       "      <td>pubTime=</td>\n",
       "      <td>pubTime=</td>\n",
       "      <td>pubTime=</td>\n",
       "      <td>pubTime=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>pageSize=40</td>\n",
       "      <td>pageSize=40</td>\n",
       "      <td>pageSize=40</td>\n",
       "      <td>pageSize=40</td>\n",
       "      <td>pageSize=40</td>\n",
       "      <td>pageSize=40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>salary=</td>\n",
       "      <td>salary=</td>\n",
       "      <td>salary=</td>\n",
       "      <td>salary=</td>\n",
       "      <td>salary=</td>\n",
       "      <td>salary=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>compTag=180</td>\n",
       "      <td>compTag=188</td>\n",
       "      <td>compTag=185</td>\n",
       "      <td>compTag=183</td>\n",
       "      <td>compTag=176</td>\n",
       "      <td>compTag=130</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>sortFlag=</td>\n",
       "      <td>sortFlag=</td>\n",
       "      <td>sortFlag=</td>\n",
       "      <td>sortFlag=</td>\n",
       "      <td>sortFlag=</td>\n",
       "      <td>sortFlag=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>compIds=</td>\n",
       "      <td>compIds=</td>\n",
       "      <td>compIds=</td>\n",
       "      <td>compIds=</td>\n",
       "      <td>compIds=</td>\n",
       "      <td>compIds=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>subIndustry=</td>\n",
       "      <td>subIndustry=</td>\n",
       "      <td>subIndustry=</td>\n",
       "      <td>subIndustry=</td>\n",
       "      <td>subIndustry=</td>\n",
       "      <td>subIndustry=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>jobKind=</td>\n",
       "      <td>jobKind=</td>\n",
       "      <td>jobKind=</td>\n",
       "      <td>jobKind=</td>\n",
       "      <td>jobKind=</td>\n",
       "      <td>jobKind=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>industries=</td>\n",
       "      <td>industries=</td>\n",
       "      <td>industries=</td>\n",
       "      <td>industries=</td>\n",
       "      <td>industries=</td>\n",
       "      <td>industries=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>compscale=</td>\n",
       "      <td>compscale=</td>\n",
       "      <td>compscale=</td>\n",
       "      <td>compscale=</td>\n",
       "      <td>compscale=</td>\n",
       "      <td>compscale=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>key=</td>\n",
       "      <td>key=</td>\n",
       "      <td>key=</td>\n",
       "      <td>key=</td>\n",
       "      <td>key=</td>\n",
       "      <td>key=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpX...</td>\n",
       "      <td>siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpX...</td>\n",
       "      <td>siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpX...</td>\n",
       "      <td>siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpX...</td>\n",
       "      <td>siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpX...</td>\n",
       "      <td>siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpX...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>d_sfrom=search_unknown</td>\n",
       "      <td>d_sfrom=search_unknown</td>\n",
       "      <td>d_sfrom=search_unknown</td>\n",
       "      <td>d_sfrom=search_unknown</td>\n",
       "      <td>d_sfrom=search_unknown</td>\n",
       "      <td>d_sfrom=search_unknown</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>d_ckId=0c2410a36507b09206766cd10bec35ea</td>\n",
       "      <td>d_ckId=0c2410a36507b09206766cd10bec35ea</td>\n",
       "      <td>d_ckId=0c2410a36507b09206766cd10bec35ea</td>\n",
       "      <td>d_ckId=0c2410a36507b09206766cd10bec35ea</td>\n",
       "      <td>d_ckId=0c2410a36507b09206766cd10bec35ea</td>\n",
       "      <td>d_ckId=0c2410a36507b09206766cd10bec35ea</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>d_curPage=0</td>\n",
       "      <td>d_curPage=0</td>\n",
       "      <td>d_curPage=0</td>\n",
       "      <td>d_curPage=0</td>\n",
       "      <td>d_curPage=0</td>\n",
       "      <td>d_curPage=0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>d_pageSize=40</td>\n",
       "      <td>d_pageSize=40</td>\n",
       "      <td>d_pageSize=40</td>\n",
       "      <td>d_pageSize=40</td>\n",
       "      <td>d_pageSize=40</td>\n",
       "      <td>d_pageSize=40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>d_headId=0c2410a36507b09206766cd10bec35ea</td>\n",
       "      <td>d_headId=0c2410a36507b09206766cd10bec35ea</td>\n",
       "      <td>d_headId=0c2410a36507b09206766cd10bec35ea</td>\n",
       "      <td>d_headId=0c2410a36507b09206766cd10bec35ea</td>\n",
       "      <td>d_headId=0c2410a36507b09206766cd10bec35ea</td>\n",
       "      <td>d_headId=0c2410a36507b09206766cd10bec35ea</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               中国500强  \\\n",
       "0                                           compkind=   \n",
       "1                                                dqs=   \n",
       "2                                            pubTime=   \n",
       "3                                         pageSize=40   \n",
       "4                                             salary=   \n",
       "5                                         compTag=180   \n",
       "6                                           sortFlag=   \n",
       "7                                            compIds=   \n",
       "8                                        subIndustry=   \n",
       "9                                            jobKind=   \n",
       "10                                        industries=   \n",
       "11                                         compscale=   \n",
       "12                                               key=   \n",
       "13  siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpX...   \n",
       "14                             d_sfrom=search_unknown   \n",
       "15            d_ckId=0c2410a36507b09206766cd10bec35ea   \n",
       "16                                        d_curPage=0   \n",
       "17                                      d_pageSize=40   \n",
       "18          d_headId=0c2410a36507b09206766cd10bec35ea   \n",
       "\n",
       "                                              互联网300强  \\\n",
       "0                                           compkind=   \n",
       "1                                                dqs=   \n",
       "2                                            pubTime=   \n",
       "3                                         pageSize=40   \n",
       "4                                             salary=   \n",
       "5                                         compTag=188   \n",
       "6                                           sortFlag=   \n",
       "7                                            compIds=   \n",
       "8                                        subIndustry=   \n",
       "9                                            jobKind=   \n",
       "10                                        industries=   \n",
       "11                                         compscale=   \n",
       "12                                               key=   \n",
       "13  siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpX...   \n",
       "14                             d_sfrom=search_unknown   \n",
       "15            d_ckId=0c2410a36507b09206766cd10bec35ea   \n",
       "16                                        d_curPage=0   \n",
       "17                                      d_pageSize=40   \n",
       "18          d_headId=0c2410a36507b09206766cd10bec35ea   \n",
       "\n",
       "                                              创新企业100  \\\n",
       "0                                           compkind=   \n",
       "1                                                dqs=   \n",
       "2                                            pubTime=   \n",
       "3                                         pageSize=40   \n",
       "4                                             salary=   \n",
       "5                                         compTag=185   \n",
       "6                                           sortFlag=   \n",
       "7                                            compIds=   \n",
       "8                                        subIndustry=   \n",
       "9                                            jobKind=   \n",
       "10                                        industries=   \n",
       "11                                         compscale=   \n",
       "12                                               key=   \n",
       "13  siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpX...   \n",
       "14                             d_sfrom=search_unknown   \n",
       "15            d_ckId=0c2410a36507b09206766cd10bec35ea   \n",
       "16                                        d_curPage=0   \n",
       "17                                      d_pageSize=40   \n",
       "18          d_headId=0c2410a36507b09206766cd10bec35ea   \n",
       "\n",
       "                                             强制造业500强  \\\n",
       "0                                           compkind=   \n",
       "1                                                dqs=   \n",
       "2                                            pubTime=   \n",
       "3                                         pageSize=40   \n",
       "4                                             salary=   \n",
       "5                                         compTag=183   \n",
       "6                                           sortFlag=   \n",
       "7                                            compIds=   \n",
       "8                                        subIndustry=   \n",
       "9                                            jobKind=   \n",
       "10                                        industries=   \n",
       "11                                         compscale=   \n",
       "12                                               key=   \n",
       "13  siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpX...   \n",
       "14                             d_sfrom=search_unknown   \n",
       "15            d_ckId=0c2410a36507b09206766cd10bec35ea   \n",
       "16                                        d_curPage=0   \n",
       "17                                      d_pageSize=40   \n",
       "18          d_headId=0c2410a36507b09206766cd10bec35ea   \n",
       "\n",
       "                                               专精特新企业  \\\n",
       "0                                           compkind=   \n",
       "1                                                dqs=   \n",
       "2                                            pubTime=   \n",
       "3                                         pageSize=40   \n",
       "4                                             salary=   \n",
       "5                                         compTag=176   \n",
       "6                                           sortFlag=   \n",
       "7                                            compIds=   \n",
       "8                                        subIndustry=   \n",
       "9                                            jobKind=   \n",
       "10                                        industries=   \n",
       "11                                         compscale=   \n",
       "12                                               key=   \n",
       "13  siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpX...   \n",
       "14                             d_sfrom=search_unknown   \n",
       "15            d_ckId=0c2410a36507b09206766cd10bec35ea   \n",
       "16                                        d_curPage=0   \n",
       "17                                      d_pageSize=40   \n",
       "18          d_headId=0c2410a36507b09206766cd10bec35ea   \n",
       "\n",
       "                                                  独角兽  \n",
       "0                                           compkind=  \n",
       "1                                                dqs=  \n",
       "2                                            pubTime=  \n",
       "3                                         pageSize=40  \n",
       "4                                             salary=  \n",
       "5                                         compTag=130  \n",
       "6                                           sortFlag=  \n",
       "7                                            compIds=  \n",
       "8                                        subIndustry=  \n",
       "9                                            jobKind=  \n",
       "10                                        industries=  \n",
       "11                                         compscale=  \n",
       "12                                               key=  \n",
       "13  siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpX...  \n",
       "14                             d_sfrom=search_unknown  \n",
       "15            d_ckId=0c2410a36507b09206766cd10bec35ea  \n",
       "16                                        d_curPage=0  \n",
       "17                                      d_pageSize=40  \n",
       "18          d_headId=0c2410a36507b09206766cd10bec35ea  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "公司分类_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Observation: the company category parameter is compTag\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>中国500强</th>\n",
       "      <th>互联网300强</th>\n",
       "      <th>创新企业100</th>\n",
       "      <th>强制造业500强</th>\n",
       "      <th>专精特新企业</th>\n",
       "      <th>独角兽</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>compTag=180</td>\n",
       "      <td>compTag=188</td>\n",
       "      <td>compTag=185</td>\n",
       "      <td>compTag=183</td>\n",
       "      <td>compTag=176</td>\n",
       "      <td>compTag=130</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        中国500强      互联网300强      创新企业100     强制造业500强       专精特新企业  \\\n",
       "5  compTag=180  compTag=188  compTag=185  compTag=183  compTag=176   \n",
       "\n",
       "           独角兽  \n",
       "5  compTag=130  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "公司分类_df[5:6]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['compTag=180',\n",
       " 'compTag=188',\n",
       " 'compTag=185',\n",
       " 'compTag=183',\n",
       " 'compTag=176',\n",
       " 'compTag=130']"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "compTag_list = 公司分类_df.loc[5].to_list()  # row 5 holds the compTag entry of every category\n",
    "compTag_list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['中国500强', '互联网300强', '创新企业100', '强制造业500强', '专精特新企业', '独角兽']"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "公司分类_名称 = 公司分类_df.columns.to_list()\n",
    "公司分类_名称"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'中国500强': '180',\n",
       " '互联网300强': '188',\n",
       " '创新企业100': '185',\n",
       " '强制造业500强': '183',\n",
       " '专精特新企业': '176',\n",
       " '独角兽': '130'}"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "compTag_参数构建 = {name: tag.split('=')[1] for name, tag in zip(公司分类_名称, compTag_list)}  # map category name -> compTag value\n",
    "compTag_参数构建"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adding a keyword to see what changes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "中国500强_用户体验_url = \"https://www.liepin.com/zhaopin/?compkind=&dqs=&pubTime=&pageSize=40&salary=&compTag=180&sortFlag=15&compIds=&subIndustry=&jobKind=&industries=&compscale=&key=%E7%94%A8%E6%88%B7%E4%BD%93%E9%AA%8C&siTag=8iR9k8w0PtD9mZEhJ_u3lQ%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_prime&d_ckId=cf952c45036cc199a3928d9c14e390b7&d_curPage=0&d_pageSize=40&d_headId=cf952c45036cc199a3928d9c14e390b7\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['compkind=',\n",
       " 'dqs=',\n",
       " 'pubTime=',\n",
       " 'pageSize=40',\n",
       " 'salary=',\n",
       " 'compTag=180',\n",
       " 'sortFlag=15',\n",
       " 'compIds=',\n",
       " 'subIndustry=',\n",
       " 'jobKind=',\n",
       " 'industries=',\n",
       " 'compscale=',\n",
       " 'key=%E7%94%A8%E6%88%B7%E4%BD%93%E9%AA%8C',\n",
       " 'siTag=8iR9k8w0PtD9mZEhJ_u3lQ%7EfA9rXquZc5IkJpXC-Ycixw',\n",
       " 'd_sfrom=search_prime',\n",
       " 'd_ckId=cf952c45036cc199a3928d9c14e390b7',\n",
       " 'd_curPage=0',\n",
       " 'd_pageSize=40',\n",
       " 'd_headId=cf952c45036cc199a3928d9c14e390b7']"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "中国500强_用户体验_url_list = 中国500强_用户体验_url.split(\"?\")[1].split('&')\n",
    "中国500强_用户体验_url_list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "中国500强_用户体验_S = pd.Series(中国500强_用户体验_url_list,name=\"中国500强_用户体验\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0                                             compkind=\n",
       "1                                                  dqs=\n",
       "2                                              pubTime=\n",
       "3                                           pageSize=40\n",
       "4                                               salary=\n",
       "5                                           compTag=180\n",
       "6                                             sortFlag=\n",
       "7                                              compIds=\n",
       "8                                          subIndustry=\n",
       "9                                              jobKind=\n",
       "10                                          industries=\n",
       "11                                           compscale=\n",
       "12                                                 key=\n",
       "13    siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpX...\n",
       "14                               d_sfrom=search_unknown\n",
       "15              d_ckId=0c2410a36507b09206766cd10bec35ea\n",
       "16                                          d_curPage=0\n",
       "17                                        d_pageSize=40\n",
       "18            d_headId=0c2410a36507b09206766cd10bec35ea\n",
       "Name: 中国500强, dtype: object"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "公司分类_df[\"中国500强\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "import urllib.parse"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'用户体验'"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "urllib.parse.unquote('%E7%94%A8%E6%88%B7%E4%BD%93%E9%AA%8C')  # decode the percent-encoded value of key: this is the search keyword"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>中国500强</th>\n",
       "      <th>中国500强_用户体验</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>compkind=</td>\n",
       "      <td>compkind=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>dqs=</td>\n",
       "      <td>dqs=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>pubTime=</td>\n",
       "      <td>pubTime=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>pageSize=40</td>\n",
       "      <td>pageSize=40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>salary=</td>\n",
       "      <td>salary=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>compTag=180</td>\n",
       "      <td>compTag=180</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>sortFlag=</td>\n",
       "      <td>sortFlag=15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>compIds=</td>\n",
       "      <td>compIds=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>subIndustry=</td>\n",
       "      <td>subIndustry=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>jobKind=</td>\n",
       "      <td>jobKind=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>industries=</td>\n",
       "      <td>industries=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>compscale=</td>\n",
       "      <td>compscale=</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>key=</td>\n",
       "      <td>key=%E7%94%A8%E6%88%B7%E4%BD%93%E9%AA%8C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpX...</td>\n",
       "      <td>siTag=8iR9k8w0PtD9mZEhJ_u3lQ%7EfA9rXquZc5IkJpX...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>d_sfrom=search_unknown</td>\n",
       "      <td>d_sfrom=search_prime</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>d_ckId=0c2410a36507b09206766cd10bec35ea</td>\n",
       "      <td>d_ckId=cf952c45036cc199a3928d9c14e390b7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>d_curPage=0</td>\n",
       "      <td>d_curPage=0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>d_pageSize=40</td>\n",
       "      <td>d_pageSize=40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>d_headId=0c2410a36507b09206766cd10bec35ea</td>\n",
       "      <td>d_headId=cf952c45036cc199a3928d9c14e390b7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               中国500强  \\\n",
       "0                                           compkind=   \n",
       "1                                                dqs=   \n",
       "2                                            pubTime=   \n",
       "3                                         pageSize=40   \n",
       "4                                             salary=   \n",
       "5                                         compTag=180   \n",
       "6                                           sortFlag=   \n",
       "7                                            compIds=   \n",
       "8                                        subIndustry=   \n",
       "9                                            jobKind=   \n",
       "10                                        industries=   \n",
       "11                                         compscale=   \n",
       "12                                               key=   \n",
       "13  siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpX...   \n",
       "14                             d_sfrom=search_unknown   \n",
       "15            d_ckId=0c2410a36507b09206766cd10bec35ea   \n",
       "16                                        d_curPage=0   \n",
       "17                                      d_pageSize=40   \n",
       "18          d_headId=0c2410a36507b09206766cd10bec35ea   \n",
       "\n",
       "                                          中国500强_用户体验  \n",
       "0                                           compkind=  \n",
       "1                                                dqs=  \n",
       "2                                            pubTime=  \n",
       "3                                         pageSize=40  \n",
       "4                                             salary=  \n",
       "5                                         compTag=180  \n",
       "6                                         sortFlag=15  \n",
       "7                                            compIds=  \n",
       "8                                        subIndustry=  \n",
       "9                                            jobKind=  \n",
       "10                                        industries=  \n",
       "11                                         compscale=  \n",
       "12           key=%E7%94%A8%E6%88%B7%E4%BD%93%E9%AA%8C  \n",
       "13  siTag=8iR9k8w0PtD9mZEhJ_u3lQ%7EfA9rXquZc5IkJpX...  \n",
       "14                               d_sfrom=search_prime  \n",
       "15            d_ckId=cf952c45036cc199a3928d9c14e390b7  \n",
       "16                                        d_curPage=0  \n",
       "17                                      d_pageSize=40  \n",
       "18          d_headId=cf952c45036cc199a3928d9c14e390b7  "
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([公司分类_df[\"中国500强\"],中国500强_用户体验_S],axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Observation: the search keyword is carried by the `key` parameter"
   ]
  },
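  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Comparing the two columns by eye works for 19 rows, but the same check can be automated. The sketch below is not part of the original notebook: `changed_params` is a hypothetical helper, and `q1` and `q2` are shortened versions of the two query strings compared above. It relies on `urllib.parse.parse_qsl`, which also decodes percent-encoded values.\n",
    "\n",
    "```python\n",
    "from urllib.parse import parse_qsl\n",
    "\n",
    "\n",
    "def changed_params(query_a, query_b):\n",
    "    # Hypothetical helper: map each differing parameter to its (old, new) values.\n",
    "    a = dict(parse_qsl(query_a, keep_blank_values=True))  # keep empty values such as 'key='\n",
    "    b = dict(parse_qsl(query_b, keep_blank_values=True))\n",
    "    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}\n",
    "\n",
    "\n",
    "# Shortened query strings based on the two URLs compared above\n",
    "q1 = 'compTag=180&sortFlag=&key=&d_sfrom=search_unknown'\n",
    "q2 = 'compTag=180&sortFlag=15&key=%E7%94%A8%E6%88%B7%E4%BD%93%E9%AA%8C&d_sfrom=search_prime'\n",
    "diff = changed_params(q1, q2)\n",
    "print(diff)\n",
    "```\n",
    "\n",
    "Only the parameters that differ are reported, so `compTag` drops out and `key` shows up with its decoded value."
   ]
  },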
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'用户体验'"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "urllib.parse.unquote(\"%E7%94%A8%E6%88%B7%E4%BD%93%E9%AA%8C\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>中国500强</th>\n",
       "      <th>中国500强_用户体验</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>key=</td>\n",
       "      <td>key=%E7%94%A8%E6%88%B7%E4%BD%93%E9%AA%8C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   中国500强                               中国500强_用户体验\n",
       "12   key=  key=%E7%94%A8%E6%88%B7%E4%BD%93%E9%AA%8C"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([公司分类_df[\"中国500强\"],中国500强_用户体验_S],axis=1)[12:13]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Building a parameter-template generator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 146,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = [11, 22, 33, 44, 55]  # lists are mutable: items can be reassigned in place"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 147,
   "metadata": {},
   "outputs": [],
   "source": [
    "a[0]=22"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 148,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[22, 22, 33, 44, 55]"
      ]
     },
     "execution_count": 148,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparation 1: company categories (`compTag`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'中国500强': '180',\n",
       " '互联网300强': '188',\n",
       " '创新企业100': '185',\n",
       " '强制造业500强': '183',\n",
       " '专精特新企业': '176',\n",
       " '独角兽': '130'}"
      ]
     },
     "execution_count": 120,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "compTag_参数构建"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparation 2: the keyword (`key`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'%E7%94%A8%E6%88%B7%E4%BD%93%E9%AA%8C'"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "urllib.parse.quote(\"用户体验\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparation 3: parse the URL and rebuild it (`urlparse` to parse, `urlunparse` to build)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 140,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "参数模版_list= ['https', 'www.liepin.com', '/zhaopin/', '', 'compkind=&dqs=&pubTime=&pageSize=40&salary=&compTag=180&sortFlag=&compIds=&subIndustry=&jobKind=&industries=&compscale=&key=&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=e6b61b0f973d97002dd6056cac680da3&d_curPage=0&d_pageSize=40&d_headId=e6b61b0f973d97002dd6056cac680da3', ''] \n",
      "\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'compkind': '',\n",
       " 'dqs': '',\n",
       " 'pubTime': '',\n",
       " 'pageSize': '40',\n",
       " 'salary': '',\n",
       " 'compTag': '180',\n",
       " 'sortFlag': '',\n",
       " 'compIds': '',\n",
       " 'subIndustry': '',\n",
       " 'jobKind': '',\n",
       " 'industries': '',\n",
       " 'compscale': '',\n",
       " 'key': '',\n",
       " 'siTag': '1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw',\n",
       " 'd_sfrom': 'search_unknown',\n",
       " 'd_ckId': 'e6b61b0f973d97002dd6056cac680da3',\n",
       " 'd_curPage': '0',\n",
       " 'd_pageSize': '40',\n",
       " 'd_headId': 'e6b61b0f973d97002dd6056cac680da3'}"
      ]
     },
     "execution_count": 140,
     "metadata": {},
     "output_type": "execute_result"
    }
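   ],
   "source": [
    "参数模版 = urllib.parse.urlparse(公司分类[0])\n",
    "\n",
    "# ParseResult is an immutable tuple; convert it to a list so that parts can be replaced\n",
    "参数模版_list = list(参数模版)\n",
    "print(\"参数模版_list=\",参数模版_list,'\\n')\n",
    "\n",
    "# urlunparse reassembles the six parts into a URL:\n",
    "# url_ = urllib.parse.urlunparse(参数模版_list)\n",
    "# url_\n",
    "\n",
    "# Option 1: edit the query string directly (string operations or regular expressions)\n",
    "\n",
    "# Option 2: treat each pair such as compTag=180 as key=value and build a dict\n",
    "# .query is the query string; partition(\"=\") splits at the first \"=\" only, so a value containing \"=\" stays intact\n",
    "参数模版_dict = {k: v for k, _, v in (i.partition(\"=\") for i in 参数模版.query.split(\"&\"))}\n",
    "参数模版_dict"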
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "compTag_参数构建[\"中国500强\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {},
   "outputs": [],
   "source": [
    "def 参数模板生成(compTag, keyword):\n",
    "    \"\"\"Return a copy of the template dict with compTag and key filled in.\"\"\"\n",
    "    参数 = dict(参数模版_dict)  # copy, so the shared template is not modified\n",
    "    参数[\"compTag\"] = compTag_参数构建[compTag]  # e.g. \"中国500强\" -> \"180\"\n",
    "    参数[\"key\"] = urllib.parse.quote(keyword)  # e.g. \"用户体验\" -> \"%E7%94%A8%E6%88%B7%E4%BD%93%E9%AA%8C\"\n",
    "    return 参数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'compkind': '',\n",
       " 'dqs': '',\n",
       " 'pubTime': '',\n",
       " 'pageSize': '40',\n",
       " 'salary': '',\n",
       " 'compTag': '188',\n",
       " 'sortFlag': '',\n",
       " 'compIds': '',\n",
       " 'subIndustry': '',\n",
       " 'jobKind': '',\n",
       " 'industries': '',\n",
       " 'compscale': '',\n",
       " 'key': '%E7%94%A8%E6%88%B7%E4%BD%93%E9%AA%8C',\n",
       " 'siTag': '1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw',\n",
       " 'd_sfrom': 'search_unknown',\n",
       " 'd_ckId': 'e6b61b0f973d97002dd6056cac680da3',\n",
       " 'd_curPage': '0',\n",
       " 'd_pageSize': '40',\n",
       " 'd_headId': 'e6b61b0f973d97002dd6056cac680da3'}"
      ]
     },
     "execution_count": 123,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "互联网300强_用户体验 = 参数模板生成(\"互联网300强\",\"用户体验\")\n",
    "互联网300强_用户体验"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'compkind=&dqs=&pubTime=&pageSize=40&salary=&compTag=188&sortFlag=&compIds=&subIndustry=&jobKind=&industries=&compscale=&key=%E7%94%A8%E6%88%B7%E4%BD%93%E9%AA%8C&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=e6b61b0f973d97002dd6056cac680da3&d_curPage=0&d_pageSize=40&d_headId=e6b61b0f973d97002dd6056cac680da3'"
      ]
     },
     "execution_count": 130,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Check: convert the parameter dict back into a query string\n",
    "query_互联网300强_用户体验 = \"&\".join([k+\"=\"+v for k,v in 互联网300强_用户体验.items()])\n",
    "query_互联网300强_用户体验"
   ]
  },
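  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The manual `split` and `join` steps above can also be handled by the standard library. This is a minimal sketch, not the notebook's own method: `build_url` and `demo_url` are hypothetical names, and the URL is a shortened stand-in for the real template. `parse_qsl` decodes the values and `urlencode` encodes them again, so the keyword is passed as plain text and must not be quoted beforehand.\n",
    "\n",
    "```python\n",
    "from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse\n",
    "\n",
    "\n",
    "def build_url(template_url, **overrides):\n",
    "    # Hypothetical helper: replace selected query parameters of a template URL.\n",
    "    parts = urlparse(template_url)\n",
    "    params = dict(parse_qsl(parts.query, keep_blank_values=True))  # keep empty parameters\n",
    "    params.update(overrides)\n",
    "    # urlencode percent-encodes the values; _replace swaps in the new query string\n",
    "    return urlunparse(parts._replace(query=urlencode(params)))\n",
    "\n",
    "\n",
    "# Shortened stand-in for the template URL used in this notebook\n",
    "demo_url = 'https://www.liepin.com/zhaopin/?compTag=180&key=&d_curPage=0'\n",
    "new_url = build_url(demo_url, compTag='188', key='用户体验')\n",
    "print(new_url)\n",
    "```\n",
    "\n",
    "One difference from the manual approach: already-encoded values such as `%7E` in `siTag` are decoded by `parse_qsl` and may be written back in a different but equivalent form."
   ]
  },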
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "urllib.parse.urlunparse(参数模版_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 151,
   "metadata": {},
   "outputs": [],
   "source": [
    "def url_参数模板生成(compTag, keyword):\n",
    "    # Fill in the template (same steps as 参数模板生成 above), working on a copy\n",
    "    参数 = dict(参数模版_dict)\n",
    "    参数[\"compTag\"] = compTag_参数构建[compTag]\n",
    "    参数[\"key\"] = urllib.parse.quote(keyword)\n",
    "    query_compTag_keyword = \"&\".join(k + \"=\" + v for k, v in 参数.items())\n",
    "\n",
    "    # Rebuild the URL with urlunparse(); index 4 of the six parts is the query string\n",
    "    url_parts = list(参数模版_list)\n",
    "    url_parts[4] = query_compTag_keyword\n",
    "    return urllib.parse.urlunparse(url_parts)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "url_参数模板生成('独角兽','数据产品经理')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 152,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'https://www.liepin.com/zhaopin/?compkind=&dqs=&pubTime=&pageSize=40&salary=&compTag=185&sortFlag=&compIds=&subIndustry=&jobKind=&industries=&compscale=&key=%E7%94%A8%E6%88%B7%E4%BD%93%E9%AA%8C&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=e6b61b0f973d97002dd6056cac680da3&d_curPage=0&d_pageSize=40&d_headId=e6b61b0f973d97002dd6056cac680da3'"
      ]
     },
     "execution_count": 152,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "url_参数模板生成('创新企业100','用户体验')"
   ]
  },
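  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With a URL generator in place, a run of URLs can be produced in a loop. The sketch below rests on an assumption that the notebook has not verified: that `d_curPage` is the zero-based page index. `TEMPLATE`, `comp_tags` and `page_urls` are hypothetical names, and the template is shortened to three parameters for readability.\n",
    "\n",
    "```python\n",
    "from urllib.parse import quote\n",
    "\n",
    "# Subset of the compTag mapping shown earlier in the notebook\n",
    "comp_tags = {'中国500强': '180', '独角兽': '130'}\n",
    "\n",
    "# Shortened, illustrative template; the real URLs carry more parameters\n",
    "TEMPLATE = 'https://www.liepin.com/zhaopin/?compTag={tag}&key={key}&d_curPage={page}'\n",
    "\n",
    "\n",
    "def page_urls(comp_tag, keyword, n_pages):\n",
    "    # Build one URL per page; assumes d_curPage is the zero-based page index.\n",
    "    return [\n",
    "        TEMPLATE.format(tag=comp_tags[comp_tag], key=quote(keyword), page=page)\n",
    "        for page in range(n_pages)\n",
    "    ]\n",
    "\n",
    "\n",
    "urls = page_urls('独角兽', '用户体验', 3)\n",
    "for u in urls:\n",
    "    print(u)\n",
    "```\n",
    "\n",
    "Before requesting such a list in bulk, check the site's robots.txt and pause between requests, as noted in this week's outline."
   ]
  },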
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Scraping the page data (last week's material)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 154,
   "metadata": {},
   "outputs": [],
   "source": [
    "compTag = '创新企业100'\n",
    "keyword = '用户体验'\n",
    "url = url_参数模板生成(compTag,keyword)\n",
    "session = HTMLSession()\n",
    "r = session.get( url )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 155,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>职称</th>\n",
       "      <th>公司名称</th>\n",
       "      <th>时间</th>\n",
       "      <th>经验</th>\n",
       "      <th>薪水</th>\n",
       "      <th>链结</th>\n",
       "      <th>公司URL</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>招聘首席用户体验官</td>\n",
       "      <td>公司叮当快药(北京)科技有限公司</td>\n",
       "      <td>16小时前</td>\n",
       "      <td>大专及以上</td>\n",
       "      <td>30-50k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1935093403.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8449541/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>招聘UE设计师</td>\n",
       "      <td>公司数坤科技</td>\n",
       "      <td>6小时前</td>\n",
       "      <td>统招本科</td>\n",
       "      <td>20-30k·16薪</td>\n",
       "      <td>https://www.liepin.com/job/1935731699.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9772955/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>招聘UE4视频设计师</td>\n",
       "      <td>公司银河航天(北京)科技有限公司</td>\n",
       "      <td>9小时前</td>\n",
       "      <td>统招本科</td>\n",
       "      <td>12-17k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1937316347.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9614836/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>招聘UX</td>\n",
       "      <td>公司上海栈略数据技术有限公司</td>\n",
       "      <td>前天</td>\n",
       "      <td>5-10年</td>\n",
       "      <td>15-25k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1914550380.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8960751/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>招聘UX交互设计师（BOX）</td>\n",
       "      <td>公司编程猫</td>\n",
       "      <td>2021-04-07</td>\n",
       "      <td>3-5年</td>\n",
       "      <td>15-20k·14薪</td>\n",
       "      <td>https://www.liepin.com/job/1923645151.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8632721/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>招聘用户体验总监</td>\n",
       "      <td>公司叮当快药(北京)科技有限公司</td>\n",
       "      <td>2021-04-06</td>\n",
       "      <td>5-10年</td>\n",
       "      <td>20-30k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1937829855.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8449541/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>招聘UI/UE设计师(MVQP)</td>\n",
       "      <td>公司蓝箭航天空间科技股份有限公司</td>\n",
       "      <td>2021-04-06</td>\n",
       "      <td>本科及以上</td>\n",
       "      <td>25-35k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1935172641.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8557441/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>招聘高级UE设计师</td>\n",
       "      <td>公司希瑞亚斯</td>\n",
       "      <td>2021-03-29</td>\n",
       "      <td>3-5年</td>\n",
       "      <td>25-50k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1915211816.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8624151/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>招聘资深UX设计师</td>\n",
       "      <td>公司希瑞亚斯</td>\n",
       "      <td>2021-03-29</td>\n",
       "      <td>3-5年</td>\n",
       "      <td>25-50k·15薪</td>\n",
       "      <td>https://www.liepin.com/job/1930835043.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8624151/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>招聘体验课用户转化</td>\n",
       "      <td>公司编程猫</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>经验不限</td>\n",
       "      <td>8-13k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1936217347.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8632721/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>招聘ux用户体验设计师</td>\n",
       "      <td>公司车主邦(北京)科技有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>3-5年</td>\n",
       "      <td>16-22k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1935951773.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9244569/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>招聘用户体验(UX)设计师</td>\n",
       "      <td>公司杭州观远数据有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>3-5年</td>\n",
       "      <td>10-15k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1935790663.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9343188/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>招聘用户体验专家</td>\n",
       "      <td>公司叮当快药(北京)科技有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>5-10年</td>\n",
       "      <td>25-50k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1935131597.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8449541/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>招聘用户体验高级管理</td>\n",
       "      <td>公司理想汽车</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>统招本科</td>\n",
       "      <td>25-50k·14薪</td>\n",
       "      <td>https://www.liepin.com/job/1930400733.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8572825/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>招聘用户体验高级管理</td>\n",
       "      <td>公司理想汽车</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>统招本科</td>\n",
       "      <td>25-50k·14薪</td>\n",
       "      <td>https://www.liepin.com/job/1930400731.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8572825/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>招聘高级交互设计师（UED/用户体验）</td>\n",
       "      <td>公司豌豆思维</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>经验不限</td>\n",
       "      <td>12-24k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1928901123.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9383814/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>招聘高级产品经理（用户体验方向）</td>\n",
       "      <td>公司水滴互助</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>统招本科</td>\n",
       "      <td>15-25k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1923095423.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8939472/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>招聘用户体验设计师</td>\n",
       "      <td>公司叮当快药(北京)科技有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>5-10年</td>\n",
       "      <td>面议</td>\n",
       "      <td>https://www.liepin.com/job/1922956865.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8449541/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>招聘高级UED</td>\n",
       "      <td>公司豌豆思维</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>经验不限</td>\n",
       "      <td>11-22k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1928901149.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9383814/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>招聘UE4客户端开发工程师</td>\n",
       "      <td>公司寒武纪</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>本科及以上</td>\n",
       "      <td>20-30k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1929055879.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9320629/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>招聘UI/UE设计师</td>\n",
       "      <td>公司豌豆思维</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>5-10年</td>\n",
       "      <td>13-25k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1935992545.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9383814/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>招聘UX/UI设计师</td>\n",
       "      <td>公司杭州观远数据有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>5-10年</td>\n",
       "      <td>12-18k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1935790667.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9343188/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>招聘资深UE4引擎研发工程师</td>\n",
       "      <td>公司特斯联</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>5-10年</td>\n",
       "      <td>25-35k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1935397711.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8757657/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>招聘UE4/U3D美术设计</td>\n",
       "      <td>公司深兰科技(上海)有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>5-10年</td>\n",
       "      <td>面议</td>\n",
       "      <td>https://www.liepin.com/job/1923279971.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9496603/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>招聘UX/UI设计师 (MJ000553)</td>\n",
       "      <td>公司深圳追一科技有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>1-3年</td>\n",
       "      <td>15-25k·14薪</td>\n",
       "      <td>https://www.liepin.com/job/1934708693.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8848920/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>招聘资深交互设计师（UE） (MJ000047)</td>\n",
       "      <td>公司北京澎思科技有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>3-5年</td>\n",
       "      <td>15-30k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1925599375.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9728303/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>招聘UED交互设计师（太原） (MJ000116)</td>\n",
       "      <td>公司北京澎思科技有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>3-5年</td>\n",
       "      <td>5-10k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1929034415.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9728303/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>招聘医疗产品经理</td>\n",
       "      <td>公司数坤科技</td>\n",
       "      <td>6小时前</td>\n",
       "      <td>3-5年</td>\n",
       "      <td>18-30k·16薪</td>\n",
       "      <td>https://www.liepin.com/job/1937103025.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9772955/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>招聘医疗产品经理</td>\n",
       "      <td>公司数坤科技</td>\n",
       "      <td>6小时前</td>\n",
       "      <td>统招本科</td>\n",
       "      <td>18-30k·16薪</td>\n",
       "      <td>https://www.liepin.com/job/1937102905.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9772955/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>招聘产品经理</td>\n",
       "      <td>公司数坤科技</td>\n",
       "      <td>6小时前</td>\n",
       "      <td>3-5年</td>\n",
       "      <td>15-25k·16薪</td>\n",
       "      <td>https://www.liepin.com/job/1936990233.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9772955/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>招聘产品经理</td>\n",
       "      <td>公司数坤科技</td>\n",
       "      <td>6小时前</td>\n",
       "      <td>统招本科</td>\n",
       "      <td>15-25k·16薪</td>\n",
       "      <td>https://www.liepin.com/job/1935214983.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9772955/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>招聘交互设计师</td>\n",
       "      <td>公司升哲科技</td>\n",
       "      <td>11小时前</td>\n",
       "      <td>3-5年</td>\n",
       "      <td>20-35k·14薪</td>\n",
       "      <td>https://www.liepin.com/job/1937270737.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8434872/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>招聘交互设计师</td>\n",
       "      <td>公司数坤科技</td>\n",
       "      <td>6小时前</td>\n",
       "      <td>本科及以上</td>\n",
       "      <td>20-30k·16薪</td>\n",
       "      <td>https://www.liepin.com/job/1937506977.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9772955/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>招聘Product Designer</td>\n",
       "      <td>公司空中云汇</td>\n",
       "      <td>11小时前</td>\n",
       "      <td>5-10年</td>\n",
       "      <td>25-50k·15薪</td>\n",
       "      <td>https://www.liepin.com/job/1925868547.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9394931/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>招聘UI设计师-To B大屏</td>\n",
       "      <td>公司升哲科技</td>\n",
       "      <td>11小时前</td>\n",
       "      <td>3-5年</td>\n",
       "      <td>20-40k·14薪</td>\n",
       "      <td>https://www.liepin.com/job/1934908325.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8434872/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>招聘资深UI设计师</td>\n",
       "      <td>公司升哲科技</td>\n",
       "      <td>11小时前</td>\n",
       "      <td>5-10年</td>\n",
       "      <td>20-40k·14薪</td>\n",
       "      <td>https://www.liepin.com/job/1934903723.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8434872/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>招聘解决方案体验设计师</td>\n",
       "      <td>公司特斯联</td>\n",
       "      <td>10小时前</td>\n",
       "      <td>3-5年</td>\n",
       "      <td>20-30k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1937580837.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8757657/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>招聘UI Designer</td>\n",
       "      <td>公司空中云汇</td>\n",
       "      <td>11小时前</td>\n",
       "      <td>3-5年</td>\n",
       "      <td>15-25k·15薪</td>\n",
       "      <td>https://www.liepin.com/job/1924717907.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9394931/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>招聘产品专家-泰州</td>\n",
       "      <td>公司理想汽车</td>\n",
       "      <td>13小时前</td>\n",
       "      <td>1-3年</td>\n",
       "      <td>7-12k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1938143801.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8572825/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>招聘前端开发工程师</td>\n",
       "      <td>公司特斯联</td>\n",
       "      <td>13小时前</td>\n",
       "      <td>经验不限</td>\n",
       "      <td>10-20k·12薪</td>\n",
       "      <td>https://www.liepin.com/job/1937828887.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8757657/</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           职称              公司名称          时间     经验  \\\n",
       "0                   招聘首席用户体验官  公司叮当快药(北京)科技有限公司       16小时前  大专及以上   \n",
       "1                     招聘UE设计师            公司数坤科技        6小时前   统招本科   \n",
       "2                  招聘UE4视频设计师  公司银河航天(北京)科技有限公司        9小时前   统招本科   \n",
       "3                        招聘UX    公司上海栈略数据技术有限公司          前天  5-10年   \n",
       "4              招聘UX交互设计师（BOX）             公司编程猫  2021-04-07   3-5年   \n",
       "5                    招聘用户体验总监  公司叮当快药(北京)科技有限公司  2021-04-06  5-10年   \n",
       "6            招聘UI/UE设计师(MVQP)  公司蓝箭航天空间科技股份有限公司  2021-04-06  本科及以上   \n",
       "7                   招聘高级UE设计师            公司希瑞亚斯  2021-03-29   3-5年   \n",
       "8                   招聘资深UX设计师            公司希瑞亚斯  2021-03-29   3-5年   \n",
       "9                   招聘体验课用户转化             公司编程猫        一个月前   经验不限   \n",
       "10                招聘ux用户体验设计师   公司车主邦(北京)科技有限公司        一个月前   3-5年   \n",
       "11              招聘用户体验(UX)设计师      公司杭州观远数据有限公司        一个月前   3-5年   \n",
       "12                   招聘用户体验专家  公司叮当快药(北京)科技有限公司        一个月前  5-10年   \n",
       "13                 招聘用户体验高级管理            公司理想汽车        一个月前   统招本科   \n",
       "14                 招聘用户体验高级管理            公司理想汽车        一个月前   统招本科   \n",
       "15        招聘高级交互设计师（UED/用户体验）            公司豌豆思维        一个月前   经验不限   \n",
       "16           招聘高级产品经理（用户体验方向）            公司水滴互助        一个月前   统招本科   \n",
       "17                  招聘用户体验设计师  公司叮当快药(北京)科技有限公司        一个月前  5-10年   \n",
       "18                    招聘高级UED            公司豌豆思维        一个月前   经验不限   \n",
       "19              招聘UE4客户端开发工程师             公司寒武纪        一个月前  本科及以上   \n",
       "20                 招聘UI/UE设计师            公司豌豆思维        一个月前  5-10年   \n",
       "21                 招聘UX/UI设计师      公司杭州观远数据有限公司        一个月前  5-10年   \n",
       "22             招聘资深UE4引擎研发工程师             公司特斯联        一个月前  5-10年   \n",
       "23              招聘UE4/U3D美术设计    公司深兰科技(上海)有限公司        一个月前  5-10年   \n",
       "24      招聘UX/UI设计师 (MJ000553)      公司深圳追一科技有限公司        一个月前   1-3年   \n",
       "25   招聘资深交互设计师（UE） (MJ000047)      公司北京澎思科技有限公司        一个月前   3-5年   \n",
       "26  招聘UED交互设计师（太原） (MJ000116)      公司北京澎思科技有限公司        一个月前   3-5年   \n",
       "27                   招聘医疗产品经理            公司数坤科技        6小时前   3-5年   \n",
       "28                   招聘医疗产品经理            公司数坤科技        6小时前   统招本科   \n",
       "29                     招聘产品经理            公司数坤科技        6小时前   3-5年   \n",
       "30                     招聘产品经理            公司数坤科技        6小时前   统招本科   \n",
       "31                    招聘交互设计师            公司升哲科技       11小时前   3-5年   \n",
       "32                    招聘交互设计师            公司数坤科技        6小时前  本科及以上   \n",
       "33         招聘Product Designer            公司空中云汇       11小时前  5-10年   \n",
       "34             招聘UI设计师-To B大屏            公司升哲科技       11小时前   3-5年   \n",
       "35                  招聘资深UI设计师            公司升哲科技       11小时前  5-10年   \n",
       "36                招聘解决方案体验设计师             公司特斯联       10小时前   3-5年   \n",
       "37              招聘UI Designer            公司空中云汇       11小时前   3-5年   \n",
       "38                  招聘产品专家-泰州            公司理想汽车       13小时前   1-3年   \n",
       "39                  招聘前端开发工程师             公司特斯联       13小时前   经验不限   \n",
       "\n",
       "            薪水                                           链结  \\\n",
       "0   30-50k·12薪  https://www.liepin.com/job/1935093403.shtml   \n",
       "1   20-30k·16薪  https://www.liepin.com/job/1935731699.shtml   \n",
       "2   12-17k·12薪  https://www.liepin.com/job/1937316347.shtml   \n",
       "3   15-25k·12薪  https://www.liepin.com/job/1914550380.shtml   \n",
       "4   15-20k·14薪  https://www.liepin.com/job/1923645151.shtml   \n",
       "5   20-30k·12薪  https://www.liepin.com/job/1937829855.shtml   \n",
       "6   25-35k·12薪  https://www.liepin.com/job/1935172641.shtml   \n",
       "7   25-50k·12薪  https://www.liepin.com/job/1915211816.shtml   \n",
       "8   25-50k·15薪  https://www.liepin.com/job/1930835043.shtml   \n",
       "9    8-13k·12薪  https://www.liepin.com/job/1936217347.shtml   \n",
       "10  16-22k·12薪  https://www.liepin.com/job/1935951773.shtml   \n",
       "11  10-15k·12薪  https://www.liepin.com/job/1935790663.shtml   \n",
       "12  25-50k·12薪  https://www.liepin.com/job/1935131597.shtml   \n",
       "13  25-50k·14薪  https://www.liepin.com/job/1930400733.shtml   \n",
       "14  25-50k·14薪  https://www.liepin.com/job/1930400731.shtml   \n",
       "15  12-24k·12薪  https://www.liepin.com/job/1928901123.shtml   \n",
       "16  15-25k·12薪  https://www.liepin.com/job/1923095423.shtml   \n",
       "17          面议  https://www.liepin.com/job/1922956865.shtml   \n",
       "18  11-22k·12薪  https://www.liepin.com/job/1928901149.shtml   \n",
       "19  20-30k·12薪  https://www.liepin.com/job/1929055879.shtml   \n",
       "20  13-25k·12薪  https://www.liepin.com/job/1935992545.shtml   \n",
       "21  12-18k·12薪  https://www.liepin.com/job/1935790667.shtml   \n",
       "22  25-35k·12薪  https://www.liepin.com/job/1935397711.shtml   \n",
       "23          面议  https://www.liepin.com/job/1923279971.shtml   \n",
       "24  15-25k·14薪  https://www.liepin.com/job/1934708693.shtml   \n",
       "25  15-30k·12薪  https://www.liepin.com/job/1925599375.shtml   \n",
       "26   5-10k·12薪  https://www.liepin.com/job/1929034415.shtml   \n",
       "27  18-30k·16薪  https://www.liepin.com/job/1937103025.shtml   \n",
       "28  18-30k·16薪  https://www.liepin.com/job/1937102905.shtml   \n",
       "29  15-25k·16薪  https://www.liepin.com/job/1936990233.shtml   \n",
       "30  15-25k·16薪  https://www.liepin.com/job/1935214983.shtml   \n",
       "31  20-35k·14薪  https://www.liepin.com/job/1937270737.shtml   \n",
       "32  20-30k·16薪  https://www.liepin.com/job/1937506977.shtml   \n",
       "33  25-50k·15薪  https://www.liepin.com/job/1925868547.shtml   \n",
       "34  20-40k·14薪  https://www.liepin.com/job/1934908325.shtml   \n",
       "35  20-40k·14薪  https://www.liepin.com/job/1934903723.shtml   \n",
       "36  20-30k·12薪  https://www.liepin.com/job/1937580837.shtml   \n",
       "37  15-25k·15薪  https://www.liepin.com/job/1924717907.shtml   \n",
       "38   7-12k·12薪  https://www.liepin.com/job/1938143801.shtml   \n",
       "39  10-20k·12薪  https://www.liepin.com/job/1937828887.shtml   \n",
       "\n",
       "                                      公司URL  \n",
       "0   https://www.liepin.com/company/8449541/  \n",
       "1   https://www.liepin.com/company/9772955/  \n",
       "2   https://www.liepin.com/company/9614836/  \n",
       "3   https://www.liepin.com/company/8960751/  \n",
       "4   https://www.liepin.com/company/8632721/  \n",
       "5   https://www.liepin.com/company/8449541/  \n",
       "6   https://www.liepin.com/company/8557441/  \n",
       "7   https://www.liepin.com/company/8624151/  \n",
       "8   https://www.liepin.com/company/8624151/  \n",
       "9   https://www.liepin.com/company/8632721/  \n",
       "10  https://www.liepin.com/company/9244569/  \n",
       "11  https://www.liepin.com/company/9343188/  \n",
       "12  https://www.liepin.com/company/8449541/  \n",
       "13  https://www.liepin.com/company/8572825/  \n",
       "14  https://www.liepin.com/company/8572825/  \n",
       "15  https://www.liepin.com/company/9383814/  \n",
       "16  https://www.liepin.com/company/8939472/  \n",
       "17  https://www.liepin.com/company/8449541/  \n",
       "18  https://www.liepin.com/company/9383814/  \n",
       "19  https://www.liepin.com/company/9320629/  \n",
       "20  https://www.liepin.com/company/9383814/  \n",
       "21  https://www.liepin.com/company/9343188/  \n",
       "22  https://www.liepin.com/company/8757657/  \n",
       "23  https://www.liepin.com/company/9496603/  \n",
       "24  https://www.liepin.com/company/8848920/  \n",
       "25  https://www.liepin.com/company/9728303/  \n",
       "26  https://www.liepin.com/company/9728303/  \n",
       "27  https://www.liepin.com/company/9772955/  \n",
       "28  https://www.liepin.com/company/9772955/  \n",
       "29  https://www.liepin.com/company/9772955/  \n",
       "30  https://www.liepin.com/company/9772955/  \n",
       "31  https://www.liepin.com/company/8434872/  \n",
       "32  https://www.liepin.com/company/9772955/  \n",
       "33  https://www.liepin.com/company/9394931/  \n",
       "34  https://www.liepin.com/company/8434872/  \n",
       "35  https://www.liepin.com/company/8434872/  \n",
       "36  https://www.liepin.com/company/8757657/  \n",
       "37  https://www.liepin.com/company/9394931/  \n",
       "38  https://www.liepin.com/company/8572825/  \n",
       "39  https://www.liepin.com/company/8757657/  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "dict_xpaths={ \n",
    "    'text': {\n",
    "        '时间':    './/ul/li//time/text()', \n",
    "        '经验':      './/ul/li/div/div[contains(@class,\"job-info\")]/p/span[3]/text()',\n",
    "        '薪水':    './/ul/li/div/div[contains(@class,\"job-info\")]/p/span[contains(@class,\"text-warning\")]/text()'\n",
    "    },\n",
    "    'text_content': {\n",
    "        '职称':    './/ul/li/div/div[contains(@class,\"job-info\")]/h3/@title', \n",
    "        # '公司地点': './/ul/li/div/div[contains(@class,\"job-info\")]/p/a/text()',\n",
    "        '公司名称': './/ul/li//p[contains(@class,\"company-name\")]/a/@title', \n",
    "    },\n",
    "    'href': {\n",
    "        '链结':    './/ul/li/div/div[contains(@class,\"job-info\")]/h3/a/@href', \n",
    "        '公司URL': './/ul/li//p[contains(@class,\"company-name\")]/a/@href', \n",
    "    }\n",
    "}\n",
    "\n",
    "# Container node(s) that hold the job listings\n",
    "主要元素 = r.html.xpath(\n",
    "    '//div[@class=\"job-content\"]/div[@class=\"sojob-result \"]')\n",
    "\n",
    "\n",
    "def get_content(_xpath_):\n",
    "    # Evaluate the XPath against each container node; returns one result list per node\n",
    "    暂存结果 = [e.xpath(_xpath_) for e in 主要元素]\n",
    "    return 暂存结果\n",
    "\n",
    "# Build one column per field; [0] takes the results from the first container node\n",
    "数据字典 = {k:get_content(v)[0] for k,v in dict_xpaths['text_content'].items()}\n",
    "数据字典.update({k:get_content(v)[0] for k,v in dict_xpaths['text'].items()})\n",
    "数据字典.update({k:get_content(v)[0] for k,v in dict_xpaths['href'].items()})\n",
    "\n",
    "# All columns must have the same length before building the DataFrame\n",
    "print([len(v) for v in 数据字典.values()])\n",
    "\n",
    "数据 = pd.DataFrame(数据字典)\n",
    "# 数据.iloc[:,[0,1,2,3,4,5,6]]\n",
    "\n",
    "# mode=\"a\" appends a new sheet to an existing workbook (the file must already exist)\n",
    "with pd.ExcelWriter(\"week07_linpin.xlsx\", mode=\"a\", engine=\"openpyxl\") as writer:\n",
    "    数据.to_excel(writer, sheet_name=compTag+keyword)\n",
    "\n",
    "# 数据.to_excel(\"week07_linpin.xlsx\", sheet_name=compTag+keyword)\n",
    "display(数据)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6rc1"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "165px"
   },
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
